Policy-Gradient Methods for Planning
Author
Abstract
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning — in the form of a policy-gradient method — to these domains. Our emphasis is on large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.
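To make the factored idea concrete, here is a minimal sketch in the spirit of the abstract: one tiny parameterised "agent" per planning task, all trained with a REINFORCE-style gradient on a shared episode reward. All names (Agent, run_episode), the logistic policy form, and the toy reward are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a factored policy-gradient update: one small
# logistic agent per task decides whether to start that task, and every
# agent's parameters are updated with the same scalar episode reward.
import numpy as np

rng = np.random.default_rng(0)

class Agent:
    """Logistic policy: P(start task | features x) = sigmoid(w . x)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def act(self, x):
        p = 1.0 / (1.0 + np.exp(-self.w @ x))
        a = rng.random() < p
        # gradient of log P(a | x) w.r.t. w for a Bernoulli policy
        grad = (float(a) - p) * x
        return a, grad

def run_episode(agents, horizon=20, n_features=3):
    """Toy simulator: reward counts 'start' decisions that coincide with
    a positive feature signal -- a stand-in for the real planner's
    success/duration/resource objective."""
    grads = [np.zeros_like(ag.w) for ag in agents]
    reward = 0.0
    for _ in range(horizon):
        for i, ag in enumerate(agents):
            x = rng.normal(size=n_features)
            a, g = ag.act(x)
            grads[i] += g
            if a and x[0] > 0:
                reward += 1.0
    return reward, grads

agents = [Agent(3) for _ in range(5)]      # one agent per task
lr, baseline = 0.05, 0.0
for episode in range(200):
    reward, grads = run_episode(agents)
    baseline += 0.05 * (reward - baseline)  # running-mean baseline
    for ag, g in zip(agents, grads):
        ag.w += lr * (reward - baseline) * g  # REINFORCE update
```

The key design point this sketch illustrates is the factorisation: no agent sees the others' parameters, yet all are credited through the single shared reward, which keeps the per-task policies simple even as the number of tasks grows.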
Similar Papers
Gradient-based Reinforcement Planning in Policy-Search Methods
We introduce a learning method called “gradient-based reinforcement planning” (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with n...
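The abstract's exact gradient formulas are not reproduced here; for context, the standard policy-gradient identity that such methods build on can be written as

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)
    \right],
```

where $\tau$ is a trajectory sampled from the policy $\pi_\theta$ and $R(\tau)$ its return; GREP's contribution, per the abstract, is to evaluate such a gradient by planning ahead rather than by acting in the environment.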
Equilibrium Policy Gradients for Spatiotemporal Planning
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doin...
Simulation Methods for Uncertain Decision-Theoretic Planning
Experience-based reinforcement learning (RL) systems are known to be useful for dealing with domains that are a priori unknown. We believe that experience-based methods may also be useful when the model is uncertain (or even completely known). In this case experience is gained by simulating the uncertain model. This paper explores a simple way to allow experience-based RL systems to cope with u...